Using Background Contextual Knowledge for Document Representation
Abstract
We describe an approach to document representation that captures contextual dependencies between terms in a corpus and uses these dependencies to represent documents. We evaluated the representation scheme on automatic document categorisation using the Reuters test set of documents. We achieve a precision-recall break-even point of 84%, which is comparable to the best known published results. Our approach acts as a feature selection technique that is an alternative to applying techniques from machine learning and numerical taxonomy.
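The precision-recall break-even point cited in the abstract is the value at which precision and recall are equal. A minimal sketch of how it can be computed for one category follows, assuming documents are ranked by a classifier score and the cutoff is placed at the number of truly relevant documents (where precision and recall coincide by construction). The function name `break_even_point` and the sample data are hypothetical and not taken from the paper, and the paper's own evaluation procedure may differ, for example in how results are averaged across categories.

```python
def break_even_point(scores, labels):
    """Precision-recall break-even point for a single category.

    scores: classifier scores, higher means more likely relevant.
    labels: 1 if the document truly belongs to the category, else 0.

    Assumption (not from the paper): the cutoff is set at rank R, where R is
    the number of relevant documents. At that cutoff precision and recall
    share the same numerator and denominator, so they are equal.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    # Rank documents from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    total_relevant = sum(1 for _, label in ranked if label)
    if total_relevant == 0:
        raise ValueError("at least one relevant document is required")

    # Count relevant documents among the top R ranked documents.
    hits = sum(1 for _, label in ranked[:total_relevant] if label)
    return hits / total_relevant


# Hypothetical toy data: five documents, three of them relevant.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
example_labels = [1, 0, 1, 1, 0]
example_bep = break_even_point(example_scores, example_labels)
```

In the toy data, two of the top three ranked documents are relevant, so both precision and recall at that cutoff are 2/3.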
Similar resources
Document Clustering Based on Ontology and a Fuzzy Approach
Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering is one of the important techniques of text mining; it is the unsupervised classification of similar documents into different groups. The most important step...
Document Clustering with Explicit Semantic Analysis (ESA)
Document clustering has recently become a vital approach as the number of documents on the web and in proprietary repositories has increased at an unprecedented rate. Documents written in human language generally contain some context, and the usage of words depends mainly on that context; recently, researchers have attempted to enrich document representation via external knowledge bases. This...
Around the Tables – Contextual Factors in Healthcare Coverage Decisions Across Western Europe
Background: Across Western Europe, procedures and formalised criteria for taking decisions on the coverage (inclusion in the benefits basket or equivalent) of healthcare technologies vary substantially. In the decision documents, which display the justification of, or the rationale for, these decisions, national healthcare institutes ma...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcomings in capturing the semantic concepts of text have motivated researchers to use...
Knowledge Management through Content Interpretation
The improved performance of computer-based text analysis represents a major step forward for knowledge management. Reliable text interpretation allows focus to be placed upon the content of documents, rather than just the document wrapping, and this helps to emphasise the fundamental difference between knowledge management and document management. It is not uncommon for companies who wish to jo...
Publication date: 1996